“Football is two things. It’s blocking and tackling.”
Vince Lombardi
Although this quote from one of the most successful head coaches in NFL history is several decades old, and American football has of course changed since then, tackling remains an integral part of the game. In contrast to other decisions players face, such as a quarterback’s choice of a passing target, the decision to tackle is straightforward: a defensive player should always tackle the ball carrier as quickly as possible.
When assessing tackles, one is usually interested in a counterfactual: what would have happened had the player missed the tackle? Answering this question amounts to quantifying the yards saved by the defensive player. Ideally, albeit impractically, one would run the play twice, once with the defensive player executing the tackle and once without, and directly compare the yardage gained by the ball carrier to evaluate the impact of the tackle.
Since such an experiment is impossible, we approximate it by predicting the end-of-play yard line twice: first with the closest defender, who executed the tackle, included, and then with this player excluded. However, the yards saved by a tackle are not by themselves an adequate measure of tackle value, because they are hard to interpret on a scale that is relevant to the game outcome. We therefore aim to measure tackle value on the scale of expected points (EP). EP can be viewed as a complicated mapping from the end-of-play yard line to the expected points in the next play. A point prediction of the mean yard line alone does not propagate uncertainty to the EP scale, so we produce a full conditional density estimate from which we calculate the expected points. The resulting metric quantifies the prevented expected points (PEP).
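To make the role of the density estimate concrete, here is a minimal sketch of how PEP could be computed from samples of the end-of-play yard line. The function `toy_expected_points` is a made-up, hypothetical EP curve used only for illustration, not a fitted EP model, and the two sample sets are synthetic. The sign convention (PEP as mean EP without the tackler minus mean EP with the tackler) is likewise an assumption.

```python
import numpy as np

def toy_expected_points(yards_to_end_zone):
    """Hypothetical, made-up EP curve; NOT a fitted expected-points model."""
    y = np.clip(np.asarray(yards_to_end_zone, dtype=float), 0.0, 100.0)
    # Highest value at the goal line, decaying nonlinearly with distance.
    return 7.0 * np.exp(-y / 30.0) - 1.0

def mean_expected_points(yard_line_samples):
    """Average EP over samples drawn from the predictive density."""
    return float(np.mean(toy_expected_points(yard_line_samples)))

def prevented_expected_points(samples_with_tackler, samples_without_tackler):
    """Assumed convention: EP the offense would have had minus EP it actually has."""
    return (mean_expected_points(samples_without_tackler)
            - mean_expected_points(samples_with_tackler))

# Synthetic samples of the end-of-play yard line (yards to the end zone).
rng = np.random.default_rng(0)
with_tackler = rng.normal(40.0, 2.0, size=1000)
# Without the tackler: bimodal, with 30% of the mass at the end zone.
without_tackler = np.concatenate([rng.normal(35.0, 3.0, size=700), np.zeros(300)])

pep = prevented_expected_points(with_tackler, without_tackler)

# Plugging the mean yard line into the EP curve ignores the spread of the density.
ep_of_mean = float(toy_expected_points(without_tackler.mean()))
mean_of_ep = mean_expected_points(without_tackler)
```

Because the toy EP curve is nonlinear, `ep_of_mean` and `mean_of_ep` differ, which is the reason a point prediction alone is not enough.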
To accurately predict the yard line at the end of any given play, we derive several features from the tracking data. More specifically, we carried out the following preprocessing steps:
We transformed the coordinate system by
For each play, we define the x-position of the ball carrier in the last frame as the end-of-play yard line. The response variable we aim to predict is the yards to be gained, defined as the difference between the x-position of the ball carrier in a given frame and the end-of-play yard line.
For all players and the ball carrier we use the features already contained in the tracking data, namely x- and y-coordinates, speed, acceleration, distance covered, orientation and direction.
For all players except the ball carrier we further compute the
For defensive players only, we additionally compute the absolute difference between the defender’s direction and the angle of the line segment connecting the defender and the ball carrier.
Subsequently, we order all players (in each frame) by their Euclidean distance to the ball carrier and standardize all features.
For the identification of tackle events, we do not rely on the event column. Instead, we define as the tackle event the frame in which the distance of the tackler (whom we derive from the tackle event data set) to the ball carrier is minimal within a given play.
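The preprocessing steps above can be sketched on a toy play as follows. This is an illustration under stated assumptions, not the exact pipeline: the column names (`frameId`, `nflId`, `x`, `y`, `dir`) are assumed, the sign of the response assumes the offense moves toward larger x, `dir` is assumed to be in degrees measured clockwise from the positive y-axis, and wrapping the angle difference to the range 0 to 180 is an added choice.

```python
import numpy as np
import pandas as pd

def add_carrier_features(tracking, carrier_id):
    """Attach carrier position, yards to be gained, and distance ranks (one play)."""
    carrier = (tracking[tracking["nflId"] == carrier_id]
               .sort_values("frameId")[["frameId", "x", "y"]]
               .rename(columns={"x": "bc_x", "y": "bc_y"}))
    end_x = carrier["bc_x"].iloc[-1]  # end-of-play yard line (last frame)
    out = tracking.merge(carrier, on="frameId", how="left")
    # Assumed sign convention: the offense moves toward larger x.
    out["yards_to_gain"] = end_x - out["bc_x"]
    out["dist_to_bc"] = np.hypot(out["x"] - out["bc_x"], out["y"] - out["bc_y"])
    # Rank 0 is the ball carrier, rank 1 the closest other player, and so on.
    ranks = out.groupby("frameId")["dist_to_bc"].rank(method="first")
    out["dist_rank"] = ranks.astype(int) - 1
    return out

def direction_mismatch(df):
    """Angle (degrees, 0..180) between a player's direction and the line to the carrier."""
    # Assumes `dir` is in degrees, clockwise from the positive y-axis.
    bearing = np.degrees(np.arctan2(df["bc_x"] - df["x"], df["bc_y"] - df["y"])) % 360.0
    diff = np.abs(df["dir"] - bearing) % 360.0
    return np.minimum(diff, 360.0 - diff)

def tackle_frame(df, tackler_id):
    """Frame in which the tackler is closest to the ball carrier."""
    rows = df[df["nflId"] == tackler_id]
    return int(rows.loc[rows["dist_to_bc"].idxmin(), "frameId"])

# Toy play: player 1 carries the ball; players 2 and 3 are defenders.
tracking = pd.DataFrame({
    "frameId": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "nflId":   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "x":   [10.0, 14.0, 20.0, 12.0, 14.0, 19.0, 15.0, 15.5, 18.0],
    "y":   [20.0, 20.0, 25.0, 20.0, 20.0, 24.0, 20.0, 20.0, 23.0],
    "dir": [90.0, 270.0, 90.0, 90.0, 270.0, 90.0, 90.0, 270.0, 90.0],
})
play = add_carrier_features(tracking, carrier_id=1)
play["dir_mismatch"] = direction_mismatch(play)
frame_of_tackle = tackle_frame(play, tackler_id=2)
```

In this toy play, defender 2 runs straight at the ball carrier, so the direction mismatch is zero, while defender 3 runs away from the carrier and gets a large mismatch.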
Describe which plays we removed and why. Yes, I would briefly do so.
…add an example play to illustrate the “what if” scenario… Mark the end-of-play yard line and the “yards to be gained” here.
Our analysis comprises four steps:
We train a model to predict the yards to be gained, from which we can calculate the end-of-play yard line (see Yurko et al., 2020). The model uses the previously described features, including only the ten closest defenders, and should account for potential non-linear and interaction effects. The time-series nature of the data suggests deep learning architectures such as transformers or LSTMs. However, we aim to go beyond point estimates of the end-of-play yard line and therefore set up a conditional density estimator \(\hat{f}(y \mid x)\), which allows for adequate uncertainty propagation in the following steps. As a middle ground between accuracy in mean prediction and uncertainty quantification, we use a random forest comprising 1000 individual trees. Modeling the uncertainty is especially important in our use case, as the variance of the end-of-play yard line differs substantially between game situations.
Report RMSE and MAE.
For each tackle, we remove the closest defender at the moment of the tackle and replace that player’s features with those of the second closest defender. We then replace the second closest with the third closest, and so on. This yields a prediction for the hypothetical “what if the tackle had been missed” scenario, which can then be compared with the actual tackle.
Using the trained random forest, we predict the end-of-play yard line with the 1000 trees. Using a kernel density estimator for visualization, we can plot the dynamically evolving conditional density estimate within any given play.
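The modeling steps above can be illustrated with a small, self-contained sketch using scikit-learn and SciPy. Everything here is an assumption made for illustration: the data are synthetic, defenders are described only by their distance to the ball carrier, the feature layout (carrier features followed by one block per defender, ordered by distance) is hypothetical, and the forest uses 50 trees and 3 defender slots instead of 1000 trees and ten defenders to keep the example fast. Treating the per-tree predictions as samples from the conditional density is one simple reading of the approach, not necessarily the exact estimator used.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.ensemble import RandomForestRegressor

N_SLOTS = 3  # defenders used by the toy model (the text uses the ten closest)

def build_row(carrier_feats, defender_feats, n_slots=N_SLOTS):
    """Carrier features followed by the n_slots closest defenders (rows sorted by distance)."""
    return np.concatenate([carrier_feats, defender_feats[:n_slots].ravel()])

def build_row_without_closest(carrier_feats, defender_feats, n_slots=N_SLOTS):
    """Counterfactual row: drop the closest defender so every other one moves up a slot."""
    return build_row(carrier_feats, defender_feats[1:], n_slots)

def per_tree_predictions(forest, row):
    """One prediction per tree, treated here as a sample from the conditional density."""
    row = row.reshape(1, -1)
    return np.array([tree.predict(row)[0] for tree in forest.estimators_])

# Synthetic training data: four defenders per play, sorted by distance to the carrier.
rng = np.random.default_rng(0)
n_plays = 500
carrier = rng.normal(size=(n_plays, 2))
defenders = np.sort(rng.uniform(0.5, 20.0, size=(n_plays, 4)), axis=1)[:, :, None]
X = np.stack([build_row(c, d) for c, d in zip(carrier, defenders)])
# Toy target: the more room to the closest defender, the more yards to be gained.
y = defenders[:, 0, 0] + rng.normal(scale=0.5, size=n_plays)

forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=5, random_state=0)
forest.fit(X, y)

# One hand-crafted tackle frame: the tackler is 1 yard away, the next defender 10 yards.
c = np.zeros(2)
d = np.array([[1.0], [10.0], [12.0], [15.0]])
samples_actual = per_tree_predictions(forest, build_row(c, d))
samples_missed = per_tree_predictions(forest, build_row_without_closest(c, d))

# Smooth the per-tree samples with a kernel density estimator for plotting.
grid = np.linspace(0.0, 25.0, 200)
density_actual = gaussian_kde(samples_actual)(grid)
density_missed = gaussian_kde(samples_missed)(grid)
```

Comparing `samples_missed` with `samples_actual` gives the two predictive distributions of the "what if" comparison. Evaluating the two densities frame by frame would give the evolving picture within a play.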
For the purpose of illustration, we present a specific example play. The video below shows a successful passing play by the Detroit Lions against the Miami Dolphins. After a completed pass, the receiver (in this case tight end T.J. Hockenson) gains a substantial number of yards by evading a tackle and is finally stopped only 12 yards short of the end zone.
Below we display an animation of that same play (in the transformed coordinate system). At each frame, we add the conditional density of the yards to be gained from our model. A few observations stand out. First, at the beginning of the play the density is rather narrow, because the model expects a tackle from the closest defender. As soon as T.J. Hockenson evades the first tackle, the density changes: the variance of the distribution of yards to be gained increases, and we even observe a bimodal distribution with a lot of mass at the end zone. Finally, at the time of the tackle the distribution becomes quite narrow again, as we expect the runner to gain only a few more yards.
The wording of the text above still needs to be improved. This was only a first attempt.